Projection of Argumentative Corpora from Source to Target Languages
Abstract
Argumentative corpora are costly to create and are available in only a few languages, with English dominating the area. In this paper, we release the first publicly available Mandarin argumentative corpus. The corpus is created by exploiting the idea of comparable corpora from Statistical Machine Translation. We use existing corpora in English and manually map their claims and premises to comparable corpora in Mandarin. We also implement a simple solution to automate this approach, with a view to creating argumentative corpora in other less-resourced languages. In this way, we introduce a new task of multi-lingual argument mapping that can be evaluated using our English-Mandarin argumentative corpus. The preliminary results of our automatic argument mapper reflect the simplicity of our approach, but provide a baseline for further improvements.
Similar resources
Investigating the Social Practice of Persian Translations of ‘The Girl You Left Behind’ through Translators’ Lexical and Grammatical Strategies
The present study aimed to shed light on the differences in social practice between the Persian translations of The Girl You Left Behind by Jojo Moyes (2012) and the original English text, based on Fairclough's (1995) model. In this regard, through a careful analysis of the source and target texts, English social practice instances were selected along with their Persian equivalents as the cor...
The SAWA Corpus: A Parallel Corpus English - Swahili
Research in data-driven methods for Machine Translation has greatly benefited from the increasing availability of parallel corpora. Processing the same text in two different languages yields useful information on how words and phrases are translated from a source language into a target language. To investigate this, a parallel corpus is typically aligned by linking linguistic tokens in the sour...
The Projector: An Interactive Annotation Projection Visualization Tool
Previous works proposed annotation projection in parallel corpora to inexpensively generate treebanks or propbanks for new languages. In this approach, linguistic annotation is automatically transferred from a resource-rich source language (SL) to translations in a target language (TL). However, annotation projection may be adversely affected by translational divergences between specific langua...
Cross Lingual Syntax Projection for Resource-Poor Languages
Over the past few decades, supervised learning in structured spaces has been quite successful in syntactic analysis problems in natural language processing. These learning techniques exploit large amounts of annotated data to learn models that can perform linguistic analysis on unseen data. Acquiring such supervised linguistic annotations for a language is important for natural language process...
Joint part-of-speech and dependency projection from multiple sources
Most previous work on annotation projection has been limited to a subset of Indo-European languages, using only a single source language, and projecting annotation for one task at a time. In contrast, we present an Integer Linear Programming (ILP) algorithm that simultaneously projects annotation for multiple tasks from multiple source languages, relying on parallel corpora available for hundred...